Abstract. Linearizability is a well-established consistency and correctness
criterion for concurrent data types. An important feature of linearizability
is Herlihy and Wing's locality principle, which says that a concurrent
system is linearizable if and only if all of its constituent parts
(so-called objects) are linearizable. This paper presents P-compositionality,
which generalizes the idea behind the locality principle to operations on
the same concurrent data type. We implement P-compositionality in a
novel linearizability checker. Our experiments with over nine implementations
of concurrent sets, including Intel's TBB library, show that our
linearizability checker is one order of magnitude faster and/or more space
efficient than the state-of-the-art algorithm.


1 Introduction 

Linearizability [1] is a well-established correctness criterion for concurrent data 
types and it corresponds to one of the three desirable properties of a distributed 
system, namely consistency [2]. The intuition behind linearizability is that every 
operation on a concurrent data type is guaranteed to take effect instantaneously 
at some point between its call and return. 

The significance of linearizability for contemporary distributed key/value
stores has been highlighted recently by the Jepsen project, an extensive case
study into the correctness of distributed systems. Interestingly, Jepsen found
linearizability bugs in several distributed key/value stores despite the fact that
they were designed based on formally verified distributed consensus protocols.
This illustrates that there is often a gap between the design and the
implementation of distributed systems. This gap motivates the study in this paper into
runtime verification techniques (in the form of so-called linearizability checkers)
for finding linearizability bugs in a single run of a concurrent system.

The input to a linearizability checker consists of a sequential specification of
a data type and a certain partially ordered set of operations, called a history. A
history represents a single terminating run of a concurrent system. We assume
that the concurrent system is deadlock-free since there already exist good
deadlock detection tools. Despite the restriction to single histories, the problem of
checking linearizability is NP-complete [3]. This high computational complexity
means that writing an efficient linearizability checker is inherently difficult.
The problem is to find ways of pruning a huge search space: in the worst case,
its size is O(N!) where N is the length of the run of a concurrent system.

This paper presents a novel linearizability checker that efficiently prunes the
search space by partitioning it into independent, faster to solve, subproblems.
To achieve this, we propose P-compositionality (Definition 6), a new partitioning
scheme of which Herlihy and Wing's locality principle [1] is an instance. Recall
that locality says that a concurrent system Q is linearizable if and only if each
concurrent object in Q is linearizable. The crux of P-compositionality is that
it generalizes the idea behind the locality principle to operations on the same
concurrent object. For example, the operations on a concurrent unordered set
and map are linearizable if and only if the restriction to each key is linearizable.
This is not a consequence of Herlihy and Wing's locality principle.

In this paper, we study the pragmatics of P-compositionality through its
implementation in a novel linearizability checker and experimental evaluation.
Our implementation is based on Wing and Gong's algorithm (WG algorithm) [4]
and a recent extension by Lowe [5]. We call Lowe's extension of Wing and
Gong's algorithm the WGL algorithm. The idea behind the WGL algorithm is
to prune states that are equivalent to an already seen state. Lowe's experiments
show that the WGL algorithm can solve a significantly larger number of problem
instances than the WG algorithm. We therefore use the more recent WGL
algorithm as our starting point.

Our linearizability checker preserves three practical properties of the
algorithms in the WG-family that we deem important. Firstly, our tool is precise,
i.e., it reports no false alarms. This is particularly significant for evaluating
large code bases, as effectively shown by the Jepsen project. Secondly, our tool
takes as input an executable specification of the data type to be checked. This
significantly simplifies the task of expressing the expected behaviour of a data
type because one merely writes code, i.e., no expertise in formal modeling is
required. Finally, our tool can be easily integrated with a range of runtime
monitors to generate a history from a run of a concurrent system. This is essential to
make it a viable runtime verification technique.

We experimentally evaluate our linearizability checker using nine different
implementations of concurrent sets, including Intel's TBB library, as exemplars
of P-compositionality. Our experiments show that our linearizability checker
is at least one order of magnitude faster and/or more space efficient than the
WGL algorithm. Overall, the results of our work can therefore dramatically
increase the number of runs that can be checked for linearizability bugs in a
given time budget.

The rest of this paper is organized as follows. We first formalize the problem
by recalling familiar concepts (§ 2). We then present P-compositionality (§ 3) on
which our decision procedure (§ 4) is based. We implement and experimentally
evaluate our decision procedure (§ 5). Finally, we discuss related work (§ 6) and
conclude the paper (§ 7).


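[Figure: history diagram with three operation intervals on the object 'set'.
set.insert(1): true runs from call_1 to ret_1; set.remove(1): false runs from
call_2 to ret_2 and overlaps the first interval; set.contains(1): true runs
from call_3 to ret_3, after both.]

Fig. 1: A history diagram H_1 for the operations on a concurrent set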


2 Background 

We recall familiar concepts that are fundamental to everything that follows. 

Definition 1 (History). Let E = {call, ret} × ℕ. For all natural numbers n in ℕ,
call_n = (call, n) in E is called a call and ret_n = (ret, n) in E is called a return. The
invocation of a procedure with input and output arguments is called an operation. An
object comprises a finite set of such operations. For all e in E, obj(e) and op(e) denote
the object and operation of e, respectively. A history is a tuple (H, obj, op) where H is
a finite sequence of calls and returns, totally ordered by ≤_H. When no ambiguity arises,
we simply write H for a history. We write |H| for the length of H.

Intuitively, a history H records a particular run of a concurrent system. Using
the implicitly associated functions obj and op, a history H gives relevant
information on all operations performed at runtime, and the sequence of calls
and returns in H gives the relative points in time at which an operation started
and completed with respect to other operations. This can be visualized using
the familiar history diagrams [1], as illustrated next.

Example 1. Consider a concurrent set with the usual operations: 'insert' adds an
element to a set, whereas 'remove' does the opposite, and 'contains' checks
membership. The return value indicates the success of the operation. For example,
'set.remove(1): true' denotes the operation that successfully removed '1' from
the object 'set', whereas 'set.remove(1): false' denotes the operation that did
not modify 'set' because '1' is already not in the set. Then the history diagram
in Fig. 1 can be defined by H_1 = (call_1, call_2, ret_1, ret_2, call_3, ret_3) such that,
for all 1 ≤ i ≤ 3, obj(call_i) = obj(ret_i) = 'set', and the following holds:

- op(call_1) = op(ret_1) = 'insert(1): true',

- op(call_2) = op(ret_2) = 'remove(1): false',

- op(call_3) = op(ret_3) = 'contains(1): true'.

Note that |H_1| = 6 and the total ordering ≤_{H_1} satisfies, among other
constraints, ret_1 ≤_{H_1} call_3 because ret_1 precedes call_3 in the sequence H_1.

Henceforth, we draw diagrams as in Fig. 1. Linearizability is ultimately
defined in terms of sequential histories, in the following sense:





Definition 2 (Complete and sequential history). Let e, e' ∈ E and H be a history.
If e is a call and e' is a return in H, both are matching whenever e ≤_H e' and their
objects and operations are equal, i.e. obj(e) = obj(e') and op(e) = op(e'). A history
is called complete if every call has a unique matching return. A complete history is
called sequential whenever it alternates between matching calls and returns
(necessarily starting with a call).

Example 2. The following history H_2 is sequential:

remove(1): false    insert(1): true    contains(1): true


And so is H_3 that we get when we swap the first two operations in H_2
(although the resulting sequence of operations is not what we would expect from
a sequential set, as discussed next):

insert(1): true    remove(1): false    contains(1): true


H_3 in Example 2 illustrates that a history can be sequential even though
it may not satisfy the expected sequential behaviour of the data type. This is
addressed by the following definition:

Definition 3 (Specification). A specification, denoted by φ (possibly with a
subscript), is a unary predicate on sequential histories.

Example 3. Define φ_set to be the specification of a sequential finite set. This means
that, given a sequential history S according to Definition 2, the predicate φ_set(S)
holds if and only if the input and output of 'insert', 'remove' and 'contains' in
S are consistent with the operations on a set. For example, φ_set(H_2) = true,
whereas φ_set(H_3) = false for the histories from Example 2.

Remark 1. In the upcoming decision procedure (§ 4), every φ is an executable
specification. Informally, this is achieved by 'replaying' all operations in a
sequential history S in the order in which they appear in S. If in any step the
output deviates from the expected result, the executable specification returns
false; otherwise, if it reaches the end of S, it returns true.
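
The replay idea in Remark 1 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: operations are encoded as (name, argument, result) triples, and the names phi_set, H2 and H3 are hypothetical stand-ins for φ_set, H_2 and H_3 from Examples 2 and 3, not code from the checker itself.

```python
def phi_set(sequential_history):
    """Replay (name, arg, result) triples on an initially empty set.

    Returns False as soon as a recorded result deviates from what a
    sequential set would return, and True if the replay reaches the end.
    """
    state = frozenset()  # immutable stand-in for a persistent set
    for name, arg, expected in sequential_history:
        if name == "insert":
            actual, state = arg not in state, state | {arg}
        elif name == "remove":
            actual, state = arg in state, state - {arg}
        elif name == "contains":
            actual = arg in state
        else:
            raise ValueError("unknown operation: " + name)
        if actual != expected:
            return False
    return True


# The two sequential histories from Example 2.
H2 = [("remove", 1, False), ("insert", 1, True), ("contains", 1, True)]
H3 = [("insert", 1, True), ("remove", 1, False), ("contains", 1, True)]
```

Replaying H2 succeeds, whereas H3 fails at its second step because removing '1' right after inserting it would return true rather than false.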

The next definition will be key to answer which calls may be reordered in a 
history in order to satisfy a specification. 

Definition 4 (Happens-before). Given a history H, the happens-before relation is
defined to be a partial order <_H over calls e and e' such that e <_H e' whenever e's
matching return, denoted by ret(e), precedes e' in H, i.e. ret(e) ≤_H e'. We say that
two calls e and e' happen concurrently whenever e ≮_H e' and e' ≮_H e.

Example 4. For the history in Fig. 1, we get:

- call_1 <_{H_1} call_3 and call_2 <_{H_1} call_3, i.e. call_1 and call_2 happen-before call_3;

- call_1 ≮_{H_1} call_2 and call_2 ≮_{H_1} call_1, i.e. call_1 and call_2 happen concurrently.
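
As a small cross-check of Definition 4 and Example 4, the following Python sketch computes the happens-before relation of a complete history. The encoding of calls and returns as ("call", n) and ("ret", n) pairs and the function name happens_before are our own assumptions for illustration.

```python
def happens_before(history):
    """Return the set of pairs (m, n) such that call_m happens-before call_n.

    `history` is a complete history given as a list of ("call", n) and
    ("ret", n) pairs in the order in which they occurred.
    """
    position = {event: index for index, event in enumerate(history)}
    ids = sorted(n for kind, n in history if kind == "call")
    return {
        (m, n)
        for m in ids
        for n in ids
        # call_m <_H call_n whenever ret_m precedes call_n in H.
        if m != n and position[("ret", m)] < position[("call", n)]
    }


# The history H_1 from Example 1 and Fig. 1.
H1 = [("call", 1), ("call", 2), ("ret", 1), ("ret", 2), ("call", 3), ("ret", 3)]
hb = happens_before(H1)
```

For H1 the relation contains exactly the pairs (1, 3) and (2, 3). Neither (1, 2) nor (2, 1) is included, which matches the observation that call_1 and call_2 happen concurrently.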








Note that a history H is sequential if and only if <_H is a total order. More
generally, <_H is an interval order [6]: for every x, y, u, v in H, if x <_H y and
u <_H v, then x <_H v or u <_H y. Observe that a partial order (P, <) is an interval
order if and only if no restriction of (P, <) is isomorphic to the partial order
whose Hasse diagram [7] consists of two disjoint two-element chains (a < b and
c < d, with no other relations).


Put differently, this paper is about a decision procedure (§ 4) that concerns a 
certain class of partial orders. The decision problem rests on the next definition: 

Definition 5 (Linearizability). Let φ be a specification. A φ-sequential history is
a sequential history H that satisfies φ(H). A history H is linearizable with respect
to φ if it can be extended to a complete history H' (by appending zero or more returns)
and there is a φ-sequential history S with the same obj and op functions as H' such that

L1 H' and S are equal when seen as two sets of calls and returns;

L2 <_H ⊆ <_S, i.e. for all calls e, e' in H, if e happens-before e', the same is true in S.

Informally, extending H to H' means that all pending operations have
completed. This paper therefore considers only complete histories. This is fully
justified under our stated assumption (§ 1) that the concurrent system is
deadlock-free [5]. Condition L1 means that H' and S are identical if we disregard the
order in which calls and returns occur in both sequences. Condition L2 says
that the happens-before relation between calls in H must be preserved in S.

Example 5. Recall Example 3. Then H_1 in Fig. 1 is linearizable with respect to φ_set
because H_2 is a witness for a φ_set-sequential history that respects the
happens-before relation <_{H_1} detailed in Example 4. In particular, call_1 <_{H_1} call_3 and
call_2 <_{H_1} call_3 cannot be reordered.
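
To make Definition 5 concrete, here is a deliberately naive Python sketch that decides linearizability of a complete history by enumerating all orderings of its operations. It is not the WG or WGL algorithm: it explores up to N! permutations and is only meant to illustrate conditions L1 and L2 on tiny inputs. The encoding of events and operations, and the names phi_set and is_linearizable, are assumptions made for this illustration.

```python
from itertools import permutations


def phi_set(ops):
    """Executable set specification: replay (name, arg, result) triples."""
    state = set()
    for name, arg, expected in ops:
        actual = arg in state  # result of 'remove' and 'contains'
        if name == "insert":
            actual = arg not in state
            state.add(arg)
        elif name == "remove":
            state.discard(arg)
        if actual != expected:
            return False
    return True


def is_linearizable(history, op):
    """Brute-force check of Definition 5 for a complete history.

    `history` lists ("call", n) and ("ret", n) pairs; `op[n]` is the
    operation performed between call_n and ret_n.
    """
    position = {event: index for index, event in enumerate(history)}
    ids = [n for kind, n in history if kind == "call"]
    before = [(m, n) for m in ids for n in ids
              if position[("ret", m)] < position[("call", n)]]
    for order in permutations(ids):        # L1: same operations as in H
        rank = {n: index for index, n in enumerate(order)}
        if all(rank[m] < rank[n] for m, n in before):   # L2: keep <_H
            if phi_set([op[n] for n in order]):
                return True
    return False


H1 = [("call", 1), ("call", 2), ("ret", 1), ("ret", 2), ("call", 3), ("ret", 3)]
ops = {1: ("insert", 1, True), 2: ("remove", 1, False), 3: ("contains", 1, True)}
bad = dict(ops)
bad[3] = ("contains", 1, False)  # hypothetical faulty run, not from the paper
```

For H1 with the operations of Example 1, the ordering (2, 1, 3) corresponds to the witness H_2 from Example 5. With the hypothetical faulty result in `bad`, no ordering that respects happens-before satisfies the specification.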

3 P-compositionality 

In this section, we introduce P-compositionality. We illustrate our new
partitioning scheme in Examples 7-9.

Definition 6 (P-compositionality). Let P be a function that maps a history H to
a non-trivial partition of H, i.e. P satisfies P(H) ≠ {H}. A specification φ is called
P-compositional whenever any history H is linearizable with respect to φ if and only
if, for every history H' ∈ P(H), H' is linearizable with respect to φ. When this
equivalence holds we speak of P-compositionality.

In the following examples, we assume that the partitions are non-trivial.
The first example illustrates that the locality principle [1] is an instance of
P-compositionality.


Example 6. Denote with Obj the set of objects. Let φ be a specification for all
objects in Obj. Let P_obj be the function that maps every history H to the set of
histories ℋ where each sub-history H' ∈ ℋ is the restriction of H to an object
in Obj. Then P_obj(H) is a partition of H. By the locality principle [1], a history
H is linearizable with respect to φ if and only if, for all H_obj ∈ P_obj(H), H_obj is
linearizable with respect to φ. Therefore φ is a P_obj-compositional specification.

The remaining examples show that P-compositionality strictly generalizes
the locality principle because P-compositionality can partition a history even
if the implementation details or constituent parts (i.e. objects) of a concurrent
system are unknown. For example, there are at least eight different
implementations of concurrent sets (Table 2), but we do not need to know the objects
(e.g. registers, buckets) of which such implementations consist in order to
partition one of their histories. This is in contrast to the locality principle where
such knowledge is required. Put differently, P-compositionality is all about the
interface of a concurrent data type, whereas the locality principle hinges on the
implementation details of such an interface.

Example 7. Reconsider φ_set, the specification of a set from Example 3, where all
operations have the form insert(k), remove(k) and contains(k) for some k. Let
P_set be the function that partitions every history H according to such k. Since the
'insert', 'remove' and 'contains' operations on a single set object are linearizable
if and only if the restriction to each k is linearizable, φ_set is a P_set-compositional
specification of a set.

Similarly, there exists a P_map-compositional specification for concurrent
unordered maps where every history is partitioned by each key k.

Example 8. Consider a concurrent array. Like its sequential counterpart, a
concurrent array can only be read or written at a particular array index. Let P_array
be the function that partitions a history based on such array indexes. This gives
a P_array-compositional specification of an array.

Example 9. Consider a concurrent stack where each pop and push operation
also returns the height of the stack before it is modified. Among other things,
the return value can be used to determine whether the operation has succeeded.
For example, if stack.pop returns zero, we know the pop operation was
unsuccessful (and the popped element is undefined) because the stack was empty at
the time the operation was called. We can use the returned height to partition a
history such that a concurrent stack is linearizable if and only if each partition
is linearizable. This way we get a P_stack-compositional specification of a stack.

Intuitively, the previous specifications are P-compositional because all
operations in one partition are, informally speaking, unaffected
by all operations in every other partition. For example, the return value of
set.insert(k) is unaffected by set.insert(k'), set.remove(k') and set.contains(k') for
k ≠ k'. This clearly has its limitations, however. For example, a 'size' operation
that returns the number of elements in a concurrent collection data type cannot
generally be partitioned this way.


Note that all these examples have in common that their P-compositional
specifications can be expressed as a conjunction of specifications that each
partition a history. For example, φ_set = ⋀_{k ∈ K} φ_set(k) where φ_set(k) for every k is a
sequential specification that only concerns operations on k, e.g. set.insert(k).
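
The conjunction φ_set = ⋀_{k ∈ K} φ_set(k) can be illustrated with a short Python sketch. It assumes, purely for illustration, that operations are (name, key, result) triples; phi_key plays the role of φ_set(k) and only tracks whether its one key is present, while phi_set conjoins the per-key results over the restriction of the history to each key.

```python
def phi_key(ops):
    """Specification for a single key: ops all concern the same key."""
    present = False
    for name, _key, expected in ops:
        if name == "insert":
            actual, present = not present, True
        elif name == "remove":
            actual, present = present, False
        else:  # 'contains'
            actual = present
        if actual != expected:
            return False
    return True


def phi_set(ops):
    """Set specification expressed as a conjunction of per-key specifications."""
    keys = {key for _name, key, _result in ops}
    return all(phi_key([o for o in ops if o[1] == k]) for k in keys)


good = [("insert", 0, True), ("remove", 1, False), ("contains", 0, True),
        ("insert", 1, True), ("remove", 0, True), ("contains", 1, True)]
bad = [("insert", 0, True), ("insert", 1, True), ("insert", 0, True)]
```

Because each per-key specification ignores all other keys, the operations on key 1 cannot change the verdict for key 0. The sequence `bad` is rejected solely because of its restriction to key 0, where the second insert would have to return false.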

Next, we show how to leverage the concept of P-compositionality to more 
efficiently find linearizability bugs. 

4 Decision procedure 

In this section, we explain our linearizability checking algorithm that decides
whether a history is linearizable with respect to some P-compositional
specification (Definition 6). The novelty of our decision procedure is Algorithm 3
that leverages P-compositionality. In the next section (§ 5), we experimentally
evaluate the effectiveness of Algorithm 3.

Since we base our work on the WGL algorithm (recall § 1), we use the
following data structures to represent the input to the decision procedure:

1. The specification (Definition 3) is modelled by a persistent data structure,
e.g. [8]. Most standard data types in functional programming languages can
be almost directly used this way. For instance, the specification of a set can
be modelled through an immutable sequential set.

2. A history (Definition 1), in turn, is represented by a doubly-linked list of
so-called entries. Consequently, each entry e has e.next and e.prev fields that
point to the next and previous entry, respectively. In addition, each entry e
has a match field, and we say that e is a call entry exactly if e.match ≠ null;
otherwise, e is called a return entry. Given a call entry e, e.match
corresponds to the matching return entry of e. This linked-list data structure
therefore aligns directly with the usual definition of history (Definition 1).

The idea behind the WGL algorithm (Algorithm 1) is threefold: it keeps track of
provisionally linearized call entries in a stack, it uses the stack to backtrack if
necessary, and it caches already seen configurations. We briefly explain each idea in
turn. Denote the stack of call entries by calls. Given a history H, the height of
calls is at most half of H's length, i.e. |calls| ≤ 0.5 × |H| = N. Note that there
is no rounding involved because |H| is always even since every call entry has
a matching return entry. The height of the stack grows only if a call entry can
be linearized (line 5). When the stack grows or shrinks, the history is modified
(lines 13 and 23) by the LIFT and UNLIFT procedures (Algorithm 2). We
remark that the workings of both procedures are illustrated by Example 10. If
no further call entries can be linearized but the stack is nonempty, the algorithm
backtracks and tries the next possible call entry (lines 18-24). The backtracking
points depend on the return value of apply(entry, s) and the cache. The former
(line 3) models the specification φ: by Remark 1, it determines whether entry can
be applied to the current state s of a persistent data type. The latter (lines 4-8)
is an optimization due to Lowe [5] that prunes the search space by memoizing
already seen configurations which are known to be non-linearizable. More
accurately, each configuration is a pair that consists of a set of unique call
entry identifiers and a state of the persistent data structure. The intuition behind
pruning already seen configurations is that only one of two permutations of
operations on a concurrent data type needs to be considered if they lead to an
identical state [5]. We remark that the total correctness of the WGL algorithm
follows from Wing and Gong's total correctness argument [4].


Algorithm 1 WGL linearizability checker [5]

Require: head_entry is such that head_entry.next points to the beginning of history H.
Require: N = 0.5 × |H| is half of the total number of entries reachable from head_entry.
Require: linearized is a bitset (array of bits) such that linearized[k] = 0 for all 0 ≤ k < N.
Require: For all entries e in H, 0 ≤ entry_id(e) < N.
Require: For all entries e and e' in H, if entry_id(e) = entry_id(e'), then e = e'.
Require: cache is an empty set and calls is an empty stack.

 1  while head_entry.next ≠ null do
 2    if entry.match ≠ null then                        ▷ Is call entry?
 3      (is_linearizable, s') ← apply(entry, s)         ▷ Simulate entry's operation
 4      cache' ← cache                                  ▷ Copy set
 5      if is_linearizable then
 6        linearized' ← linearized                      ▷ Copy bitset
 7        linearized'[entry_id(entry)] ← 1              ▷ Insert entry_id(entry) into bitset
 8        cache ← cache ∪ {(linearized', s')}           ▷ Update configuration cache
 9      if cache' ≠ cache then
10        calls ← push(calls, (entry, s))               ▷ Provisionally linearize call entry and state
11        s ← s'                                        ▷ Update state of persistent data type
12        linearized[entry_id(entry)] ← 1               ▷ Keep track of linearized entries
13        LIFT(entry)                                   ▷ Provisionally remove the entry from the history
14        entry ← head_entry.next                       ▷ Continue search in shortened history
15      else                                            ▷ Cannot linearize call entry
16        entry ← entry.next                            ▷ Continue search in unmodified history
17    else                                              ▷ Handle "return entry"
18      if is_empty(calls) then
19        return false                                  ▷ Cannot linearize entries in history
20      (entry, s) ← top(calls)                         ▷ Revert to earlier state
21      linearized[entry_id(entry)] ← 0
22      calls ← pop(calls)
23      UNLIFT(entry)                                   ▷ Undo provisional linearization
24      entry ← entry.next
25  return true
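
The search described above can be sketched in Python as follows. This is an illustrative reading of Algorithms 1 and 2, not the authors' C++ implementation. Several details are our own assumptions: the Entry class, the make_history helper, the use of a Python integer as the bitset, a frozenset as the persistent state, the explicit initialisation of entry and state, and the apply_set function that plays the role of apply for a set specification.

```python
class Entry:
    def __init__(self, op=None, ident=None):
        self.op, self.id = op, ident
        self.prev = self.next = self.match = None  # match is set on calls only


def make_history(events, ops):
    """Build the doubly-linked list; returns a sentinel head entry."""
    head = Entry()
    last, calls = head, {}
    for kind, n in events:
        entry = Entry(ops[n], n)
        if kind == "call":
            calls[n] = entry
        else:
            calls[n].match = entry
        last.next, entry.prev = entry, last
        last = entry
    return head


def lift(entry):      # remove a call entry and its matching return
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    match = entry.match
    match.prev.next = match.next
    if match.next is not None:
        match.next.prev = match.prev


def unlift(entry):    # undo the most recent lift of this entry
    match = entry.match
    match.prev.next = match
    if match.next is not None:
        match.next.prev = match
    entry.prev.next = entry
    entry.next.prev = entry


def apply_set(op, state):
    name, arg, expected = op
    if name == "insert":
        actual, new_state = arg not in state, state | {arg}
    elif name == "remove":
        actual, new_state = arg in state, state - {arg}
    else:  # 'contains'
        actual, new_state = arg in state, state
    return actual == expected, new_state


def wgl_check(head, apply, initial_state):
    entry, state = head.next, initial_state
    linearized, cache, calls = 0, set(), []
    while head.next is not None:
        if entry.match is not None:                  # call entry
            ok, new_state = apply(entry.op, state)
            is_new = False
            if ok:
                config = (linearized | (1 << entry.id), new_state)
                is_new = config not in cache
                cache.add(config)
            if is_new:                               # provisionally linearize
                calls.append((entry, state))
                state = new_state
                linearized |= 1 << entry.id
                lift(entry)
                entry = head.next
            else:
                entry = entry.next
        else:                                        # return entry: backtrack
            if not calls:
                return False
            entry, state = calls.pop()
            linearized &= ~(1 << entry.id)
            unlift(entry)
            entry = entry.next
    return True


events = [("call", 1), ("call", 2), ("ret", 1), ("ret", 2), ("call", 3), ("ret", 3)]
ops = {1: ("insert", 1, True), 2: ("remove", 1, False), 3: ("contains", 1, True)}
bad = dict(ops)
bad[3] = ("contains", 1, False)  # hypothetical faulty result
```

On the history H_1 from Fig. 1 the sketch first tries insert(1), fails to apply remove(1): false, backtracks at ret_2, and then succeeds with the order remove, insert, contains. With the hypothetical faulty result in `bad`, the stack eventually becomes empty at a return entry and the check fails.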

Example 10. We illustrate the handling of entries in the history data structure.
For this, consider the two histories in Fig. 2. In Fig. 2a, the entries satisfy the
following: call_2.prev = call_1, call_2.next = call_3 and call_2.match = ret_2 etc. Then
LIFT(call_2) (Algorithm 2) produces the history shown in Fig. 2b. Note that both
call_2 and ret_2 are still valid entry pointers whose fields remain unchanged. This
explains how UNLIFT(call_2) reverts the change in constant-time.






Algorithm 3 gives our partitioning scheme. This is an iterative algorithm
that, given an entry in a history H and positive integer n, partitions H starting
from that entry into at most n separate sub-histories. The partitioning is
controlled by the function partition : E → ℕ from the set of call and return entries
to the natural numbers.

Example 11. Consider the history in Fig. 2b. For all entries e in this history,
let partition(e) = k where k is the integer argument of the operation. For
example, partition(call_3) = partition(ret_3) = 1 because op(call_3) = op(ret_3) =
'remove(1): false'. Then the function PARTITION(call_1) returns two disjoint
sub-histories for the operations on '0' and '1', respectively:

  set.insert(0): true    (call_1 to ret_1)
  set.contains(0): true  (call_2 to ret_2)

and

  set.remove(1): false   (call_3 to ret_3).
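
A Python transcription of the partitioning loop (Algorithm 3) may help to see how the links are rewired. The Entry class, the link and labels helpers, and in particular the concrete order of calls and returns used below are assumptions for illustration; the text only fixes that call_1, call_2 and call_3 are consecutive entries. As in Algorithm 3, the function returns the last entry of each sub-history, so a sub-history is read by following prev pointers.

```python
class Entry:
    def __init__(self, label, key):
        self.label, self.key = label, key
        self.prev = self.next = None


def link(entries):
    """Chain entries into a doubly-linked list and return the first one."""
    for left, right in zip(entries, entries[1:]):
        left.next, right.prev = right, left
    return entries[0]


def partition_history(entry, n, partition):
    """Split the list starting at `entry` into at most n sub-histories."""
    entries = [None] * n                 # last entry seen in each partition
    while entry is not None:
        i = partition(entry) % n
        if entries[i] is not None:
            entries[i].next = entry      # append to sub-history i
        next_entry = entry.next
        entry.prev = entries[i]
        entry.next = None
        entries[i] = entry
        entry = next_entry
    return entries


def labels(tail):
    """Read a sub-history from its last entry back to its first."""
    out = []
    while tail is not None:
        out.append(tail.label)
        tail = tail.prev
    return out[::-1]


# Assumed event order for insert(0), contains(0) and remove(1).
first = link([Entry("call1", 0), Entry("call2", 0), Entry("call3", 1),
              Entry("ret2", 0), Entry("ret1", 0), Entry("ret3", 1)])
tails = partition_history(first, 2, lambda e: e.key)
```

Each sub-history preserves the relative order of its entries, so the happens-before relation within a partition is unchanged. Before running Algorithm 1 on a sub-history, one would still have to attach a head entry to it, which this sketch leaves out.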

Given a nonempty set of disjoint sub-histories returned by the PARTITION
function (Algorithm 3), we invoke Algorithm 1 on each sub-history. It is not too
difficult to implement sub-histories such that there is no sharing between them,
and Algorithm 1 could therefore be run in parallel for each sub-history.
Nevertheless, this addresses a challenging problem that was identified independently
by Lowe [5] and Kingsbury [9].

Theorem 1. Let φ be a P-compositional specification and H be a history. Denote with
head_entry the entry that represents the beginning of H. Associate with each disjoint
history H_k in partition P(H) a unique number 0 ≤ k < |P(H)| = n. If, for all
H_k ∈ P(H) and e ∈ H_k, partition(e) = k, then H is linearizable with respect to φ if
and only if Algorithm 1 returns true for every history in PARTITION(head_entry, n).

We next experimentally quantify the benefits of the previous theorem. 

5 Implementation and experiments 

In this section, we discuss and experimentally evaluate our implementation of
the decision procedure (§ 4). As an exemplar of P-compositionality, our
experiments use Intel's TBB library and Lowe's implementations of concurrent sets.


[Figure: two history diagrams. (a) set.insert(0): true from call_1 to ret_1,
set.contains(0): true from call_2 to ret_2, and set.remove(1): false from
call_3 to ret_3. (b) The same history without call_2 and ret_2:
set.insert(0): true from call_1 to ret_1 and set.remove(1): false from
call_3 to ret_3.]

Fig. 2: After calling LIFT(call_2) in history (2a), we get the history in (2b).
UNLIFT(call_2) reverts this change in constant-time.










Algorithm 2 History modifications

 1  procedure LIFT(entry)
 2    entry.prev.next ← entry.next
 3    entry.next.prev ← entry.prev
 4    match ← entry.match
 5    match.prev.next ← match.next
 6    if match.next ≠ null then
 7      match.next.prev ← match.prev
 8
 9  procedure UNLIFT(entry)
10    match ← entry.match
11    match.prev.next ← match
12    if match.next ≠ null then
13      match.next.prev ← match
14    entry.prev.next ← entry
15    entry.next.prev ← entry


Algorithm 3 History partitioner

Require: n is a positive integer
Require: entries is an array of size n

 1  function PARTITION(entry, n)
 2    for 0 ≤ i < n do
 3      entries[i] ← null
 4    while entry ≠ null do
 5      i ← partition(entry) mod n
 6      if entries[i] ≠ null then
 7        entries[i].next ← entry
 8      next_entry ← entry.next
 9      entry.prev ← entries[i]
10      entry.next ← null
11      entries[i] ← entry
12      entry ← next_entry
13    return entries


5.1 Implementation 

The implementation details of an NP-complete decision procedure matter,
especially for our experimental evaluation of P-compositionality. We particularly
consider hashing and cache eviction options because these were not studied in
previous implementations of the WG-based algorithms [4,5].

For experimental robustness, we implemented our linearizability checker
in C++11 [10] because this language has built-in concurrency support while
allowing us to rule out interference from managed runtime environments (e.g.
JVM) due to garbage collection etc. The choice of language, though, meant that
we had to implement persistent data structures from scratch. In doing so, we
focused on optimizing equality checks for our specific purposes. This way, we
managed to avoid a known performance bottleneck in Lowe's implementation
of the WGL algorithm [5] where the cost of equality checks had to be
compensated with an additional union-find data structure. Another optimization in our
implementation is a constant-time (instead of linear-time) hash function for
bitsets where we exploit the fact that the bitwise XOR operator over fixed-size bit
vectors forms an abelian group. This optimization turns out to be important
when histories are longer than 8K, cf. [5]. To see this, consider the
computational steps for retrieving a configuration from the cache and updating it (line 8
in Algorithm 1). For example, a history of length 2^^ means that each bitset
in a configuration is at least 3KiB, and so a constant-time hash function can
make a measurable difference when the cache is frequently accessed. In fact,
it is not uncommon for the cache to contain more than 27 K of such
configurations. For this reason, we also implemented a least recently used (LRU) cache
eviction feature that can optionally be enabled at compile-time. The effects of
the LRU cache will be evaluated shortly.
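
One way to obtain a hash for bitsets that is updated in constant time is sketched below in Python. The construction is an assumption on our part, since this section does not spell out the actual hash function: each bit index gets a fixed pseudo-random 64-bit key, and the hash of a bitset is the XOR of the keys of all set bits. Because XOR is associative and commutative and every element is its own inverse, flipping one bit only requires XOR-ing a single key into the running hash. The class name HashedBitset and its methods are hypothetical.

```python
import random


class HashedBitset:
    """Fixed-size bitset whose hash is maintained incrementally via XOR."""

    def __init__(self, size, seed=0):
        rng = random.Random(seed)
        # One pseudo-random 64-bit key per bit index (assumed scheme).
        self.keys = [rng.getrandbits(64) for _ in range(size)]
        self.bits = bytearray((size + 7) // 8)
        self.hash = 0

    def flip(self, k):
        """Set or clear bit k and update the hash in constant time."""
        self.bits[k >> 3] ^= 1 << (k & 7)
        self.hash ^= self.keys[k]

    def test(self, k):
        return bool(self.bits[k >> 3] >> (k & 7) & 1)

    def full_hash(self):
        """Linear-time recomputation, only used to cross-check `hash`."""
        h = 0
        for k, key in enumerate(self.keys):
            if self.test(k):
                h ^= key
        return h


b = HashedBitset(1024)
b.flip(3)      # linearize entry 3
b.flip(700)    # linearize entry 700
b.flip(3)      # backtrack: entry 3 is no longer linearized
```

After these three updates only bit 700 is set, so the running hash equals the key of index 700 and agrees with the linear-time recomputation. The same flip operation serves both for marking an entry as linearized and for undoing that during backtracking.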










           |           WGL             |          WGL+LRU          |          WGL+P
Benchmark  | Time    Memory    Timeout | Time    Memory   Timeout  | Time    Memory   Timeout
-----------+---------------------------+---------------------------+--------------------------
TBB        | 101 s    9792 MiB    0%   |  11 s   670 MiB     0%    |   6 s   672 MiB     0%
CRLSL      |  20 s   15738 MiB    0%   |  25 s   678 MiB     0%    |   6 s   400 MiB     0%
CRLFSL     |  14 s   15029 MiB    0%   |  18 s   678 MiB     0%    |   5 s   401 MiB     0%
FGL        |  16 s   14297 MiB    0%   |  81 s   678 MiB     0%    |   5 s   401 MiB     0%
LLL        |  23 s   16494 MiB    0%   |  94 s   678 MiB     0%    |   6 s   401 MiB     0%
LSL        |  20 s   15736 MiB    0%   |  25 s   678 MiB    14%    |   6 s   401 MiB     0%
LFLL       |  11 s   11847 MiB    0%   |  15 s   678 MiB     0%    |   5 s   402 MiB     0%
LFSL       |  14 s   14712 MiB    0%   |  18 s   678 MiB     0%    |   5 s   401 MiB     0%
LFSLF0     |  14 s   13125 MiB    0%   |  18 s   678 MiB     0%    |   5 s   402 MiB     0%
LFSLF1     |  <1 s     404 MiB    0%   |  <1 s   407 MiB     0%    |  <1 s   402 MiB     0%
OPTIMIST   |  16 s   13818 MiB    0%   |  54 s   678 MiB     9%    |   5 s   401 MiB     0%

Table 1: Experimental results for three variants of the same linearizability
checker. The results for the baseline are reported in the WGL column. The rows
correspond to benchmarks drawn from Intel's TBB library and Lowe's
implementations of concurrent sets (see Table 2 for mnemonics).


Overall, our implementation and experimental setup is around 4 K lines of
code, including several dozen unit tests. All the code and benchmarks are
publicly available in our source code repository.^


5.2 TBB and concurrent set experiments 

For the experimental evaluation of our partitioning scheme, we collected over 
700 histories from nine different implementations of concurrent sets by Lowe [5] 
and the concurrent unordered set implementation in Intel's TBB library.^ We 
performed all experiments on a 64-bit machine running GNU/Linux 3.17 with 
12 Intel Xeon 2.4 GHz cores and 94 GB of main memory. 

Each history is generated by running 4 concurrent threads that pseudo-randomly
invoke operations on a single shared concurrent set. The argument of
each operation is a pseudo-random uniformly distributed integer between 0
(inclusive) and 24 (exclusive). Each thread invokes 70 K such operations. Note
that this is significantly more than in previous experiments where each process
is limited to 2^13 ≈ 8K operations [5]. In total, since every call generates a pair
of entries, every history H in our benchmarks has length |H| = 4 × 2 × 70 K =
560 K. We discuss the experimental results using Intel's TBB library and Lowe's
concurrent set implementations in turn.


^ https://github.com/ahorn/linearizability-checker 
^ https://www.threadingbuildingblocks.org/ 


























The experimental results are given in Table 1. Each of the three main columns
corresponds to one variant of the same linearizability checker: 'WGL' is the
baseline, 'WGL+LRU' is the WGL algorithm with LRU cache eviction enabled
(§ 5.1), and 'WGL+P' is the WGL algorithm combined with our partitioning
algorithm (Algorithm 3 in § 4). We tried to use the WG algorithm [4] without the
extension by Lowe [5] but WG times out on the majority of benchmarks. We
therefore do not report the results on the WG algorithm and focus on WGL,
WGL+LRU and WGL+P. The meaning of the sub-columns is as follows. The
'Time' and 'Memory' columns give the average of the elapsed time and
virtual memory usage, respectively. These averages exclude runs that we had to
terminate after 1 hour. The percentage of such terminated runs is given in the
'Timeout' column. In each row, all variants are compared with respect to the
same benchmark data. We therefore do not report confidence intervals.

The TBB benchmark corresponds to the first row in Table 1 and consists of 
a total of 100 histories. Table 1 clearly shows that the WGL+P algorithm is at 
least one order of magnitude faster than the baseline. We also see that 
enabling the LRU cache eviction decreases the memory footprint by at least one 
order of magnitude, approximately 10 GiB versus 700 MiB. In fact, WGL+LRU is 
also almost one order of magnitude faster than the 
baseline. The WGL+P algorithm is at least as fast and almost as space efficient 
as WGL+LRU. In the experiments with Lowe's implementations of concurrent 
sets (see next paragraph), we further investigate the effect of the LRU cache 
eviction feature and how it compares to the partitioning scheme. 
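
For readers unfamiliar with the idea, a bounded cache with least-recently-used 
eviction can be sketched as follows. This is a generic illustration, not the 
cache used by the checker: the class name `LRUSet` and its interface are 
hypothetical, and how the checker of § 5.1 integrates eviction is not shown 
here. 

```python
from collections import OrderedDict


class LRUSet:
    """A set of hashable keys that holds at most `capacity` entries.

    When the capacity is exceeded, the least recently used key is dropped.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def add(self, key):
        """Record key; return True if it was not cached, False otherwise."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return False
        self._items[key] = None
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return True

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


cache = LRUSet(capacity=2)
first = [cache.add("a"), cache.add("b"), cache.add("a"), cache.add("c")]
```

In a search that uses such a cache only to avoid revisiting configurations, 
evicting an entry bounds memory at the price of possibly exploring that 
configuration again. 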

We give Lowe's implementations of concurrent sets mnemonics (Table 2) 
that identify the remaining ten benchmarks in Table 1. Each of these ten 
benchmarks comprises between 50 and 100 histories with an average of 70 histories 
per benchmark. To avoid bias, we collected these using Lowe's tool. The 
significance of the experimental results in Table 1 is twofold. Firstly, they show that 
on average, WGL+P is three times faster than WGL, and WGL+P consumes one 
order of magnitude less space than WGL. Secondly, and more crucially, 
these experiments reveal that WGL+LRU is not as efficient as WGL+P in 
either time or space. For example, for WGL+LRU the average elapsed time 
of the FGL and LLL benchmarks is 81 s and 94 s, respectively, with an average 
memory usage of 678 MiB in both cases. By contrast, WGL+P achieves an 
average runtime of less than 7 s (and so WGL+P is one order of magnitude faster 
than WGL+LRU) and consumes even less memory on average (401 MiB) than 
WGL+LRU. The higher average runtime of WGL+LRU in the FGL benchmark 
is due to a single check that took about two orders of magnitude longer (3068 s) 
than the remaining checks (20 s on average when the 3068 s outlier is excluded). 
In the LLL benchmark there are two such outliers (2201 s and 675 s, whereas the 
other checks average 27 s). The observed difference between WGL+LRU and 
WGL+P is even more pronounced in both the LSL and OPTIMIST benchmarks 
where the LRU cache eviction causes 14% and 9% of runs to time out, whereas 
the WGL+P algorithm always runs to completion within a few seconds. 
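
To see how a single slow check can dominate an average, the figures for the 
benchmark with the 3068 s outlier can be reproduced approximately with a short 
calculation. The number of checks is an assumption here: the text only states 
that each benchmark has between 50 and 100 histories, and 50 is used purely 
for illustration. The helper name `mean_with_outliers` is likewise 
hypothetical. 

```python
def mean_with_outliers(n_runs, outliers_s, typical_mean_s):
    """Mean elapsed time when a few outliers are mixed into typical runs.

    n_runs: total number of checks (including the outliers).
    outliers_s: elapsed times of the outlier checks, in seconds.
    typical_mean_s: average elapsed time of all remaining checks.
    """
    n_typical = n_runs - len(outliers_s)
    if n_typical < 0:
        raise ValueError("more outliers than runs")
    return (sum(outliers_s) + n_typical * typical_mean_s) / n_runs


# Assumption: 50 checks in total (hypothetical; the text gives a 50-100 range).
avg_s = mean_with_outliers(n_runs=50, outliers_s=[3068.0], typical_mean_s=20.0)
print(f"{avg_s:.2f} s")  # (3068 + 49 * 20) / 50 = 80.96
```

Under this assumption the result rounds to the reported 81 s, although the 
actual number of completed checks is not stated in the text. 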


Benchmark name                            Mnemonic 
----------------------------------------  -------- 
collision resistance lazy skip list       CRLSL 
collision resistance lock-free skip list  CRLFSL 
fine-grained lock                         FGL 
lazy linked-list                          LLL 
lazy skip list                            LSL 
lock-free linked-list                     LFLL 
lock-free skip list                       LFSL 
lock-free skip list faulty (bad hash)     LFSLF0 
lock-free skip list faulty (good hash)    LFSLF1 
optimistic lock                           OPTIMIST 

Table 2: Mnemonics for Lowe's implementations of concurrent sets [5] 


This experimentally confirms that WGL+P is one order of magnitude 
faster and more space efficient than the baseline, and that WGL+P consumes 
even less space than our WGL+LRU implementation. 

6 Related work 

Linearizability is related to the concept of atomicity, including weaker forms 
such as k-atomicity [11]. An important difference is that atomicity is typically 
not defined in terms of a sequential specification, e.g. [12]. The theoretical 
limitations of automatically verifying linearizability are well understood. 
The problem is undecidable in general [13]. In fact, even checking finite-state 
implementations against atomic specifications, provided the number of program 
threads is bounded, is in EXPSPACE [14], and the best known lower bound for 
this problem is PSPACE-hardness. This explains the restrictions in this paper 
and its focus on runtime verification instead. 

The literature on machine-assisted techniques for checking linearizability 
can be broadly divided into simulation-based methods (e.g. [15,16]), model 
checking (e.g. [17,18,19,20]), static analysis (e.g. [21,22,23,24]) and fully 
automatic testing (e.g. [4,25,26,27,28,29,30,5]). The simulation-based methods have 
been used by experts to mechanically verify simple fine-grained and lock-free 
implementations. Model checking requires less expertise but is typically 
limited to very small programs and a small number of threads due to the state 
explosion problem. By contrast, static analysis tools aim to prove correctness 
with respect to an unbounded number of threads. In general, these techniques 
are necessarily incomplete and require the user to supply linearization points 
and/or invariants. Vafeiadis [24] proposes a more automatic form of static 
analysis that works well on simpler concurrent data types such as stacks but 
reportedly not so well on data types that have more complicated invariants, including 
the CAS-based and lazy concurrent sets extensively studied in our experiments. 

Our work is most closely related to linearizability testing techniques that are 
precise, fully automatic and necessarily incomplete, e.g. [4,25,26,27,28,29,30,5]. 
We focus our discussion on tools that do not require the notion of commit 
points, cf. [31]. The work in [25,30] checks k-atomicity with a polynomial-time 
algorithm assuming that each write to a register assigns a distinct value. By 
contrast, we solve a more general NP-complete problem of which k-atomicity 
is an instance. The tool in [26] analyzes code that uses concurrent collection data 
types such as maps. To make the analysis scale, the authors assume that the 
collection data types are linearizable, whereas our tool could be used to check such 
an assumption. A different tool [27] requires programmers to annotate concurrent 
implementations with so-called state summary functions that act as a form 
of specification. Our approach is more modular because it strictly separates the 
concurrent implementation from its specification. By contrast, [28] works without 
the programmer having to provide a sequential specification. As a result, 
however, the tool can only find linearizability violations when an exception is 
thrown or a deadlock occurs. Subsequent work [29] circumvents this, in the 
context of object-oriented programs, by considering the special case of a 
superclass serving as an executable, possibly non-deterministic, specification for 
all its subclasses. The fact that the superclass can be non-deterministic may 
explain why even checks of two threads can take a significant amount of time (e.g. 
108 min) despite the fact that each concurrent test considers only two possible 
linearizations [29]. By contrast, the WGL algorithm [4,5], on which our decision 
procedure is based (§ 4), is significantly faster but limited to deterministic 
specifications. Crucially, our experiments (§ 5) with P-compositional specifications 
show a significant improvement over the WGL algorithm. 

7 Concluding remarks 

We have presented a precise, fully automatic runtime verification technique 
for finding linearizability bugs in implementations of concurrent data types 
that are expected to satisfy a P-compositional specification. Our experiments 
show that our partitioning scheme improves the WGL algorithm [4,5] by one 
order of magnitude, in both time and space. An additional strength of our 
technique is that it is applicable to any linearizability checker. For this, however, 
our work assumes that the specification is P-compositional. This is 
not always the case, and it would therefore be interesting to further generalize 
P-compositionality, perhaps with a less modular partitioning scheme that can 
make more assumptions about the underlying decision procedure. 
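
The modularity claimed above can be illustrated with a minimal Python sketch. 
It is not Algorithm 3 and not the WGL algorithm: the underlying checker is a 
naive brute-force search used as a stand-in, and the names `Op`, 
`partition_by_key`, `brute_force_check` and `check_partitioned` are 
hypothetical. The sketch assumes a set specification that can be partitioned by 
element, complete histories (no pending calls), and distinct timestamps. 

```python
from collections import defaultdict, namedtuple
from itertools import permutations

# One completed operation: call/return timestamps, name, element, result.
Op = namedtuple("Op", "call ret name key result")


def partition_by_key(history):
    """Split a set history into sub-histories, one per element."""
    parts = defaultdict(list)
    for op in history:
        parts[op.key].append(op)
    return list(parts.values())


def apply_set_op(state, op):
    """Sequential set specification: return (new_state, expected_result)."""
    present = op.key in state
    if op.name == "insert":
        return state | {op.key}, not present
    if op.name == "remove":
        return state - {op.key}, present
    if op.name == "contains":
        return state, present
    raise ValueError(f"unknown operation: {op.name}")


def brute_force_check(history):
    """Naive stand-in checker: try every order that respects real time."""
    n = len(history)
    for order in permutations(history):
        # Reject orders that place an operation after one it fully precedes.
        if any(order[j].ret < order[i].call
               for i in range(n) for j in range(i + 1, n)):
            continue
        state = frozenset()
        for op in order:
            state, expected = apply_set_op(state, op)
            if expected != op.result:
                break
        else:
            return True
    return False


def check_partitioned(history, check=brute_force_check):
    """Run any checker on each sub-history; all of them must pass."""
    return all(check(part) for part in partition_by_key(history))


good = [Op(0, 3, "insert", 1, True), Op(1, 2, "contains", 1, True),
        Op(0, 1, "insert", 2, True), Op(2, 3, "remove", 2, True)]
bad = good + [Op(4, 5, "contains", 2, True)]  # element 2 was already removed
```

Since `check` is a parameter, the same wrapper could in principle be placed in 
front of a different decision procedure; the brute-force search is used here 
only to keep the example self-contained. 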

Acknowledgements. We would like to thank Gavin Lowe, Kyle Kingsbury and 
Alexey Gotsman for invaluable discussions. 